Mean Field Residual Networks: On the Edge of Chaos

Authors

  • Ge Yang
  • Samuel S. Schoenholz
Abstract

We study randomly initialized residual networks using mean field theory and the theory of difference equations. Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on the average when propagating inputs forward or gradients backward. The exponential forward dynamics causes rapid collapsing of the input space geometry, while the exponential backward dynamics causes drastic vanishing or exploding gradients. We show, in contrast, that by adding skip connections, the network will, depending on the nonlinearity, adopt subexponential forward and backward dynamics, and in many cases in fact polynomial. The exponents of these polynomials are obtained through analytic methods and proved and verified empirically to be correct. In terms of the “edge of chaos” hypothesis, these subexponential and polynomial laws allow residual networks to “hover over the boundary between stability and chaos,” thus preserving the geometry of the input space and the gradient information flow. In our experiments, for each activation function we study here, we initialize residual networks with different hyperparameters and train them on MNIST. Remarkably, our initialization time theory can accurately predict test time performance of these networks, by tracking either the expected amount of gradient explosion or the expected squared distance between the images of two input vectors. Importantly, we show, theoretically as well as empirically, that common initializations such as the Xavier or the He schemes are not optimal for residual networks, because the optimal initialization variances depend on the depth. Finally, we have made mathematical contributions by deriving several new identities for the kernels of powers of ReLU functions by relating them to the zeroth Bessel function of the second kind.
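The contrast the abstract draws between feedforward and residual forward dynamics can be illustrated with a small numerical sketch. This is not the authors' experimental setup. The width, depth, weight and bias standard deviations (`sigma_w`, `sigma_b`), the function name `cosine_by_depth`, and the simplified residual update `x + tanh(xW + b)` are all assumptions chosen for illustration. The sketch pushes two random inputs through the same randomly initialized layers and records the cosine similarity of their hidden states at each depth.

```python
import numpy as np


def cosine_by_depth(residual, depth=50, width=500,
                    sigma_w=1.0, sigma_b=0.5, seed=0):
    """Cosine similarity between the hidden states of two inputs, per layer.

    All hyperparameters are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Two independent random inputs; they start out nearly orthogonal.
    x = rng.standard_normal((2, width))
    history = []
    for _ in range(depth):
        # Fresh Gaussian weights and biases for every layer.
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        b = rng.standard_normal(width) * sigma_b
        update = np.tanh(x @ W + b)
        # Residual: add the skip connection. Feedforward: replace the state.
        x = x + update if residual else update
        cos = x[0] @ x[1] / (np.linalg.norm(x[0]) * np.linalg.norm(x[1]))
        history.append(cos)
    return np.array(history)


feedforward = cosine_by_depth(residual=False)
with_skip = cosine_by_depth(residual=True)
print("feedforward, last layer cosine:", feedforward[-1])
print("residual,    last layer cosine:", with_skip[-1])
```

With these assumed settings the feedforward tanh network drives the two hidden states toward the same direction within a few layers, which is the collapse of input geometry described above, while the network with skip connections keeps them clearly distinguishable at the same depth. The sketch only shows the qualitative contrast; it does not estimate the polynomial exponents derived in the paper.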

Similar resources

Deep Information Propagation

We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth ...

Edge detection in gravity field of the Gheshm sedimentary basin

Edge detection and edge enhancement techniques play an essential role in interpreting potential field data. This paper describes the application of various edge detection techniques to gravity data in order to delineate the edges of subsurface structures. The edge detection methods comprise analytic signal, total horizontal derivative (THDR), theta angle, tilt angle, hyperbolic of tilt angle (H...

Edge-tenacity in Networks

Numerous networks, such as road networks, electrical networks and communication networks, can be modeled by a graph. Many attempts have been made to determine how well such a network is "connected" or, stated differently, how much effort is required to break down communication in the system between at least some nodes. Two well-known measures that indicate how "reliable" a graph is are the...

A Survey on Complexity of Integrity Parameter

Many graph-theoretical parameters have been used to describe the vulnerability of communication networks, including toughness, binding number, rate of disruption, neighbor-connectivity, integrity, mean integrity, edge-connectivity vector, l-connectivity and tenacity. In this paper we discuss integrity and its properties in vulnerability calculation. The integrity of a graph G, I(G), is defined t...

Measuring Mutual Information in Random Boolean Networks

During the last few years, the information storing and processing abilities of complex systems have been an area of active research. Common opinion has it that the most interesting behaviour of these systems is found “at the edge of chaos”, which would seem to suggest that complex systems may have inherently non-trivial information processing abilities in the vicinity of sharp pha...

Publication date: 2017